Evolving Consensus Sequences with a Genetic Algorithm
Abstract
In this paper we present an approach that employs a genetic algorithm (GA) to evolve consensus sequences for DNA and protein sequence alignments. We have developed an encoding scheme under which the number of generations needed to find the optimal solution remains approximately the same regardless of the number of sequences. The complexity instead depends only on the length of the consensus sequence and the similarity among the sequences. The objective function computes sum-of-pairs (SP) scores, which serve as the fitness values. We have devised a residue profiling technique that further simplifies the calculation of the SP scores. Furthermore, to facilitate quantitative study of our GA approach, we have developed a simulation program that incorporates the most commonly used evolutionary models and generates biologically sound sequences. We performed several experiments and compared the results with those of the most commonly used heuristic alignment program, Clustal W [18, 19]. We conclude with a detailed analysis and demonstrate that our GA approach offers an attractive and competitive alternative to the heuristic approach.
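The abstract does not spell out the scoring scheme or the residue profiling technique, so the following is only a minimal sketch of the general idea: an SP score can be computed from per-column residue counts instead of by enumerating every pair of sequences. The scores `MATCH`, `MISMATCH`, and `GAP`, the choice to score gap-gap pairs as 0, and both function names are assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from itertools import combinations

# Assumed scores for illustration only; the paper's actual values are not given here.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    """Score one pair of aligned symbols; a gap-gap pair is assumed to score 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sp_score_naive(alignment):
    """Sum-of-pairs score by enumerating every pair of rows in every column."""
    return sum(
        pair_score(a, b)
        for column in zip(*alignment)
        for a, b in combinations(column, 2)
    )


def sp_score_profile(alignment):
    """Same score, computed from per-column residue counts (a simple profile)."""
    total = 0
    for column in zip(*alignment):
        counts = Counter(column)
        gaps = counts.pop("-", 0)          # number of gap symbols in this column
        residues = len(column) - gaps      # number of non-gap symbols
        same = sum(c * (c - 1) // 2 for c in counts.values())  # identical pairs
        diff = residues * (residues - 1) // 2 - same           # mismatched pairs
        total += MATCH * same + MISMATCH * diff + GAP * gaps * residues
    return total


alignment = ["ACG-T", "ACGAT", "A-GAT"]  # toy aligned sequences of equal length
print(sp_score_naive(alignment), sp_score_profile(alignment))
```

Under these assumptions both functions return the same value, but the profile version does work proportional to the number of sequences per column rather than to the number of sequence pairs, which is the kind of saving a profile-based calculation is meant to provide.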
Related works
Evolving Consensus Sequence for Multiple Sequence Alignment with a Genetic Algorithm
In this paper we present an approach that evolves the consensus sequence [25] for multiple sequence alignment (MSA) with a genetic algorithm (GA). We have developed an encoding scheme such that the number of generations needed to find the optimal solution is approximately the same regardless of the number of sequences. Instead it only depends on the length of the template and similarity between sequ...
Automated Discovery of Protein Motifs With Genetic Programming
Automated methods of machine learning may prove to be useful in discovering biologically meaningful information hidden in the rapidly growing databases of DNA sequences and protein sequences. Genetic programming is an extension of the genetic algorithm in which a population of computer programs is bred, over a series of generations, in order to solve a problem. Genetic programming is capable of...
Mitochondrial DNA variation in wild and hatchery populations of northern pike, Esox lucius L.
Esox lucius is an economically important freshwater species. Mitochondrial cytb, 12S rRNA, and 16S rRNA gene sequences were used to clarify the genetic variation and population structure in three E. lucius populations, i.e., one wild population (W) and two hatchery populations (Hatchery Population I, HPI, and Hatchery Population II, HPII). A total of 55 individuals, with 19 from wild and 1...
An Effective Hybrid Genetic Algorithm for Hybrid Flow Shops with Sequence Dependent Setup Times and Processor Blocking
Hybrid flow-shop, or flexible flow-shop, problems have remained a subject of intensive research for several years. Hybrid flow-shop problems overcome one of the limitations of the classical flow-shop model by allowing parallel processors at each stage of task processing. In many papers the assumptions are generally made that there is unlimited storage available between stages and the setup times a...